ABSTRACT


The neuron model proposed by McCulloch and Pitts combines an aggregation function with an
activation function. A standard neural network built from this model requires a large number
of neurons to solve a given problem. To overcome this difficulty, compensatory neuron models
have been proposed, which form the basis of the compensatory neural network architecture.
A total of seven compensatory neuron models have been investigated in conjunction with the
self-scaling scaled conjugate gradient algorithm. The performance of one neuron model has
been compared with that of the standard neural network trained with the scaled conjugate
gradient learning algorithm to show the efficacy of the compensatory model. The compensatory
models are also compared with one another and discussed in this work.
CHAPTER 1


Introduction

1.1 An overview of Artificial Neural Networks

Over the last decade, artificial neural networks (ANNs) have emerged as a powerful new tool
for classification and functional mapping. As universal approximators, ANNs offer a
systematic approach to these problems, especially those that are hard to analyze in full
detail. In general, neural networks can have four kinds of architecture [10], namely
single-layer feedforward networks, multilayer feedforward networks, recurrent neural
networks and lattice neural structures. Among them, the most widely used architecture is the
feedforward neural network, and the present work deals with it. A typical multilayer
feedforward neural network is shown in Fig. 1.1. Multilayer feedforward neural networks have
been applied successfully to some difficult benchmark problems by training them.



Fig. 1.1 General feedforward ANN architecture
1.2 Existing neuron models

Neural network models, even neurobiological ones, assume many simplifications over actual
biological neural networks. Such simplifications are necessary to understand the intended
properties and to attempt any mathematical analysis. In the following we discuss a few
models which are frequently used.

1.2.1 McCulloch-Pitts model

McCulloch and Pitts modeled simple logical units, called cells, so as to represent and
analyze the logic of situations that arise in any discrete process. Each cell is a simple
two-state, finite-state machine [9] and accordingly operates at discrete time instants,
which are assumed to be synchronous among all cells. It contains an aggregation function and
a binary transfer function: when the sum of the inputs exceeds some threshold value, the
output of the neuron is MAX, otherwise it is MIN. Fig. 1.2 shows the basic structure of the
McCulloch-Pitts neuron model, and the corresponding equation gives the state of the output.
The transfer function can be of unipolar or bipolar type.
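The behavior of such a cell can be illustrated with a short Python sketch. It is only a minimal illustration: the function name, the weights and the threshold are assumptions chosen here, with MAX = 1 and MIN = 0 for the unipolar case and MIN = -1 for the bipolar case.

```python
def mcculloch_pitts(inputs, weights, threshold, bipolar=False):
    """Binary threshold unit: aggregate the inputs, then apply a step function."""
    # Aggregation: weighted sum of the inputs.
    net = sum(w * x for w, x in zip(weights, inputs))
    # Binary transfer function: MAX above the threshold, MIN otherwise.
    max_out = 1
    min_out = -1 if bipolar else 0
    return max_out if net > threshold else min_out


# Logical AND of two binary inputs (weights and threshold chosen by hand).
and_table = [mcculloch_pitts((a, b), (1, 1), 1.5) for a in (0, 1) for b in (0, 1)]
```

With these hand-picked values the unit fires only when both inputs are 1, so a single cell of this kind is enough for AND.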


Fig. 1.2 Simple McCulloch-Pitts model




1.2.2 McCulloch-Pitts model with continuous transfer function

The structure of this model is the same as that of the previous model except for the
activation function, which is continuous: linear, sigmoidal, Gaussian, hyperbolic tangent,
etc. Fig. 1.3 shows the diagram of this model.



Fig. 1.3 McCulloch new model


The equations for some of the transfer functions are
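As an illustration only, the textbook forms of a few such transfer functions can be sketched in Python as follows. The function names, the unit slope, and the Gaussian centre and width parameters are assumptions made here; the scalings used in this work may differ.

```python
import math


def linear(net):
    """Linear transfer function with unit slope (assumed)."""
    return net


def sigmoid(net):
    """Unipolar sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def tanh(net):
    """Hyperbolic tangent (bipolar sigmoid), output in (-1, 1)."""
    return math.tanh(net)


def gaussian(net, centre=0.0, width=1.0):
    """Gaussian transfer function; centre and width are hypothetical parameters."""
    return math.exp(-((net - centre) ** 2) / (2.0 * width ** 2))
```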
1.3 Drawbacks of ANNs based on existing neuron models


1. The number of neurons required in the hidden layer is large for complex function
approximations.

2. The number of hidden layers required for a complicated function may be three or more.
Even though it has been reported that a three-layer network can approximate any functional
relation, the training time for it is too large.

3. The above constraints are not only computationally expensive, in terms of convergence and
the large number of neurons in each layer, but also determine the fault-tolerant
capabilities of the neural network. Several iterations are required before the ANN is
trained to give accurate results.

4. The size of the ANN decides the total number of unknowns to be determined, and hence the
minimum number of training data (input-output pairs) required for development of the neural
network model. For complex functions the training data required is huge, owing to the
requirement of a large number of neurons and hidden layers.

5. The training time of the ANN depends on the input and output mappings, such as X → Y,
ΔX → Y, X → ΔY and ΔX → ΔY.

1.4 Suggested Remedies

The following suggestions have been made in the present work to alleviate the
above-mentioned shortcomings.

1. Development of neuron models which are flexible enough to accommodate variations in
their model, and hence reduce the total number of neurons in the ANN.

2. These new neurons should also exhibit the characteristics of existing neurons, so that
the models are general enough to cover the properties of neurons ranging from simple to
higher-order ones.

3. The total number of hidden layers required must also be reduced; this would result in a
neural network model which is computationally efficient.

4. The new neuron models should not require more data for training, i.e. there should be a
reduction in the free parameters associated with the neurons.

1.5 Organization of Thesis

Chapter 2 of this thesis describes the different types of new neuron models for feedforward
networks, together with their derivations. Chapter 3 describes some second order learning
algorithms. Chapter 4 gives a brief account of the benchmark problems considered. Chapter 5
presents the results for the different neural network models on these problems. Chapter 6
concludes the present work and outlines aspects of its further development.



Chapter 2


2.1 Preliminary Remarks

We have discussed the existing neuron models and their demerits in the previous chapter. It
has been shown in the literature that a three-layer neural network can be a universal
function approximator for given input-output data. The existing neuron model has an
aggregation function with a sigmoidal, radial basis, hyperbolic tangent or linear function
as the activation function or nonlinearity. These conventional neuron models may not be
computationally efficient enough for many real-life applications. To overcome this
deficiency, the proposed compensatory models have both sigmoidal and Gaussian functions with
weight sharing. The proposed compensatory neurons have flexibility at both the aggregation
and the threshold function level to cope with the nonlinearities. The proposed neuron has
both Σ and Π aggregation functions. The final output of the neuron is a function of the two
activation outputs $x^{\Sigma}_{pj}$ and $x^{\Pi}_{pj}$ and of the weights
$w^{\Sigma}_{pj}$ and $w^{\Pi}_{pj}$ respectively. Figure 2.1 shows the general structure of
the proposed compensatory neuron model, and Figure 2.2 shows the complete compensatory
neural network architecture (CNNA) with compensatory neurons in the hidden layer.



Fig. 2.1 General compensatory neuron model









2.2 Different Compensatory Neuron Models


2.2.1 Model 0

In this model both Σ and Π have been taken as aggregation functions, and the outputs of
these aggregation functions are passed through the hyperbolic tangent and the arctangent
functions respectively. Finally, the outputs of these activation functions are weighted and
summed to give the neuron output. The output of the compensatory neuron can be written as

$$x_j = x^{\Sigma}_{pj}\, w^{\Sigma}_{pj} + x^{\Pi}_{pj}\, w^{\Pi}_{pj}$$
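The forward pass of such a neuron block can be sketched as follows. This is a minimal sketch under stated assumptions: the summation aggregation is taken as a plain weighted sum and the product aggregation as a product of weighted inputs, without bias terms, and all names (`model0_neuron`, `sum_weights`, `prod_weights`, `w_sigma`, `w_pi`) are hypothetical.

```python
import math


def model0_neuron(inputs, sum_weights, prod_weights, w_sigma, w_pi):
    """Summation-type compensatory neuron (sketch of Model 0)."""
    # Sigma aggregation: weighted sum of the inputs (assumed form).
    net_sum = sum(w * x for w, x in zip(sum_weights, inputs))
    # Pi aggregation: product of the weighted inputs (assumed form).
    net_prod = 1.0
    for w, x in zip(prod_weights, inputs):
        net_prod *= w * x
    # Activations: tanh on the Sigma branch, arctangent on the Pi branch.
    x_sigma = math.tanh(net_sum)
    x_pi = math.atan(net_prod)
    # Neuron output: weighted sum of the two activation outputs.
    return x_sigma * w_sigma + x_pi * w_pi


out = model0_neuron([1.0, 1.0], [0.5, 0.5], [1.0, 1.0], 1.0, 1.0)
```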

Described below are the equations pertaining to the weight updates for the different layers.


Output layer weights update

Neuron block weights update

2.2.2 Model 1


This model is similar to Model 0. The only difference is that in this model the weight
associated with the output of the product aggregation function, after it has been passed
through the arctangent function, is $(1 - w_{pj})$.

Hence the output of the compensatory neuron can be written as

$$x_j = x^{\Sigma}_{pj}\, w_{pj} + x^{\Pi}_{pj}\,(1 - w_{pj})$$
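The compensatory role of the single weight can be seen in a small sketch. The names `model1_output`, `x_sigma`, `x_pi` and `w` are hypothetical, and the two activation outputs are passed in directly rather than computed from inputs.

```python
def model1_output(x_sigma, x_pi, w):
    """Model 1 output: one weight w shared as w and (1 - w) between the branches."""
    return x_sigma * w + x_pi * (1.0 - w)


# Example activation values (chosen arbitrarily for illustration).
only_sigma = model1_output(0.8, 0.4, 1.0)   # Pi branch switched off
only_pi = model1_output(0.8, 0.4, 0.0)      # Sigma branch switched off
blend = model1_output(0.8, 0.4, 0.25)       # weighted blend of both branches
```

For w between 0 and 1 the output lies between the two activation outputs, so increasing the contribution of one branch reduces that of the other.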
The equations for the weight updates are

Output layer weights update

Neuron block weights update

Weights in the neuron blocks are compensatory

Input layer weights update

Model 0 and Model 1 are known as summation-type compensatory neuron models, since the
outputs of the hyperbolic tangent and arctangent functions are added up.
2.2.3 Model 2

In Model 2 the outputs of the activation functions are multiplied after being raised to the
powers $w_p$ and $w_q$.

The output of the neuron, in product form, is as given below

The output layer weight update is

$$\frac{\partial E}{\partial w_{jk}}$$

Neuron block weights update is

The input layer weights update

2.2.4 Model 3


This neuron model has a complicated aggregation function, which is neither a summation
function nor a product function alone but a combination of the two.
The output of this neuron model is


xj =(; 


^^PJ'^^PJ ^PJ^PJ 


Output layer weights update

Neuron block weights update

2.2.5 Model 4


This model is similar to Model 3, but the output of the neuron is in product form, as
follows

Output layer weights update

Neuron block weights update

Input layer weights update

2.2.6 Model 5


This is also a summation-type neuron model; however, it uses the arithmetic mean and the
geometric mean of the outputs of the activation functions, as shown below


Xj = 


Xp +XPj 
Output layer weight update is

Neuron block weights update is

Input layer weights update is

2.2.7 Model 6


This model is similar to Model 5, but the output is in product form.
The output is

The output layer weights update is

$$\frac{\partial E}{\partial w_{jk}} = -(d_k - o_k)\,x_j = \delta_k x_j$$

The neuron block weights update is
CHAPTER 3


SECOND ORDER LEARNING ALGORITHMS

3.1 Preliminary Remarks

First order learning algorithms have limitations such as divergence and slow convergence,
and the degree of accuracy achieved is generally lower. This may be attributed to the
gradient descent method, which is only an approximation based on a truncated Taylor series.
Another reason for the slow convergence of backpropagation (BP) training is the phenomenon
of premature saturation of the ANN output units, which applies to both first and second
order learning when the units are mapped by sigmoid-like functions [2]. It is characterized
by the temporary trapping of the ANN output units at saturated activation levels during the
early stage of the training process. While thus trapped, the saturated output units preclude
any significant improvement in the training weights directly connected to these units,
causing an unnecessary increase in the number of iterations required to train the ANN.

There may be cases in which the learning speed of first order standard BP (STD BP) is a
limiting factor in the practical application of ANNs to problems that require high accuracy
in the ANN mapping function. Problems of this class are related to system identification,
nonlinear modeling, time series prediction, navigation, manipulation and robotics. In
addition, STD BP [3] requires the user to select appropriate parameters, which is mainly
done by trial and error. Since one of the competitive advantages of neural networks is the
ease with which they may be applied to novel or poorly understood problems, it is imperative
to consider automated and robust learning methods with a good average performance on many
classes of problems. Many ad hoc schemes have been suggested for improving the capability of
first order learning schemes, but some of them are problem specific, while others simply
increase the computational complexity without a significant improvement in learning [4]. To
overcome this limitation, second order learning schemes have been suggested that have been
shown to accelerate the convergence of the learning phase on a variety of problems. The
divergence of these algorithms is not completely ruled out, as the Hessian matrices often
become non-positive definite. In this chapter, second order learning algorithms such as the
scaled conjugate gradient algorithm (SCGA) and the self-scaling scaled conjugate gradient
algorithm (SSCGA) are described, which try to overcome some of these limitations.

3.2 Second Order Learning Algorithms

The greatest difficulty in using the steepest descent gradient algorithm is that a
one-dimensional minimization in direction a followed by a minimization in direction b does
not imply that the function is minimized on the subspace generated by a and b. Minimization
along direction b in general spoils a previous minimization along direction a. On the
contrary, if the directions were non-interfering and linearly independent, at the end of N
steps the process would converge to the minimum of the quadratic function

$$Q(w) = c^{T} w + 0.5\, w^{T} G w$$

where G is symmetric and positive definite and N is the dimension of the weight space.
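The effect of non-interfering directions can be checked numerically. The sketch below is a minimal illustration on a hand-picked 2-D quadratic: the matrix G, the vector c and the directions are hypothetical values, not taken from this work. It minimizes Q exactly along a, and then along either the coordinate direction b or a direction conjugate to a (one satisfying a^T G b = 0).

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def mat_vec(G, v):
    return [dot(row, v) for row in G]


def gradient(G, c, w):
    """Gradient of Q(w) = c^T w + 0.5 w^T G w, i.e. G w + c."""
    return [gw + ci for gw, ci in zip(mat_vec(G, w), c)]


def line_minimize(G, c, w, d):
    """Exact minimization of Q along direction d starting from w."""
    t = -dot(d, gradient(G, c, w)) / dot(d, mat_vec(G, d))
    return [wi + t * di for wi, di in zip(w, d)]


G = [[4.0, 1.0], [1.0, 3.0]]     # symmetric, positive definite
c = [-1.0, -2.0]
a = [1.0, 0.0]
b = [0.0, 1.0]                    # coordinate direction, not conjugate to a
b_conj = [-1.0, 4.0]              # a^T G b_conj = 4*(-1) + 1*4 = 0

start = [0.0, 0.0]
after_a = line_minimize(G, c, start, a)
w_coord = line_minimize(G, c, after_a, b)
w_conj = line_minimize(G, c, after_a, b_conj)

# Slope of Q along a after the second line search.
slope_a_coord = dot(a, gradient(G, c, w_coord))   # non-zero: minimization along a spoiled
slope_a_conj = dot(a, gradient(G, c, w_conj))     # zero: minimization along a preserved
```

After the second search along b the slope along a is no longer zero, whereas the search along the conjugate direction lands on the minimum of Q in N = 2 steps.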

The concept of non-interfering directions is the basis of the conjugate gradient algorithm
(CGA) for minimization. But this method requires a line search scheme to find the step size
with which to descend in a conjugate direction. Unfortunately, this line search may not work
if the function being optimized is non-quadratic, as is frequently the case in ANNs. In the
following sections, improved versions of the CGA are described which avoid the Hessian
calculation but are as good as any second order algorithm based on it.

3.2.1 Scaled Conjugate Gradient Algorithm

SCGA, which is an improvement over the conjugate gradient algorithm (CGA), was developed by
Møller in 1993 based on the Levenberg-Marquardt approach in order to scale the step size
[5]. CGA is based on the fact that minimization of a positive definite quadratic function
is equivalent to solving the system of linear equations obtained by setting the gradient to
zero. A detailed discussion of CGA can be found in [6]. In CGA the step size is usually
determined by one of the line search methods. But in spite of such a sound theoretical
backing, it often fails and converges to non-stationary points. This may be attributed to
the Hessian matrices becoming non-positive definite. The quadratic approximation, which
holds good only in the neighborhood of the current point, may not hold for points outside it
[7]. To overcome this limitation, Møller proposed the SCGA by combining the model trust
region approach with the CGA [5]. While formulating the ANN he retained the nonlinearity in
the output. Here we remove the nonlinearity from the output, keep only the aggregation
function in the output neurons, combine this with SCGA learning and refer to it as SSCGA.
This may facilitate the scaling of the output in the higher ranges. Besides giving higher
accuracy, it may also significantly reduce the number of iterations and hence the
computation. Moreover, it rules out the possibility of premature saturation, as can be seen
in the orbit determination section, where it is able to optimize with the outputs both
without normalization and when normalized in the higher range, where the STD with steepest
descent, LG, CGA or SCGA will certainly not work.


3.3 Conjugate Gradient Algorithm

The conjugate direction method, from which the conjugate gradient method is derived, is
thoroughly described in [6]. Here an outline of the CGA is given, which is the basis for the
SCGA and the SSCGA. Conjugate direction methods are based on the pseudo-optimization
strategy described in the last chapter, but choose the search direction and the step size
more carefully by using information from the second order approximation given by

$$E(w + y) \approx E(w) + E'(w)^{T} y + 0.5\, y^{T} E''(w)\, y$$


We quote here two theorems to summarize the working of the conjugate gradient method.

Theorem 1. Let $p_1, \ldots, p_N$ be a conjugate system and $y_1$ a point in the weight
space. Let the points $y_2, \ldots, y_{N+1}$ be recursively defined by
$y_{k+1} = y_k + \alpha_k p_k$, where $\alpha_k = \mu_k / \delta_k$,
$\mu_k = -p_k^{T} E'_{qw}(y_k)$ and $\delta_k = p_k^{T} E''(w)\, p_k$. Then $y_{k+1}$
minimizes $E_{qw}$ restricted to the k-plane $\pi_k$ given by $y_1$ and $p_1, \ldots, p_k$,
where $E_{qw}(y) = E(w) + E'(w)^{T} y + 0.5\, y^{T} E''(w)\, y$.

Theorem 1 assumes that a conjugate system is given. But this is not necessary, and the
conjugate vectors $p_1, \ldots, p_N$ can be determined recursively. This is achieved by
setting $p_1$ to the steepest descent vector $-E'_{qw}(y_1)$. Then $p_{k+1}$ is determined
recursively as a linear combination of the current steepest descent vector
$-E'_{qw}(y_{k+1})$ and the previous direction $p_k$. Theorem 2 states this concisely.

Theorem 2. Let $y_1$ be a point in the weight space and let $p_1$ and $r_1$ equal the
steepest descent vector $-E'_{qw}(y_1)$. Define $p_{k+1}$ recursively by
$p_{k+1} = r_{k+1} + \beta_k p_k$, where $r_{k+1} = -E'_{qw}(y_{k+1})$,
$\beta_k = \left(|r_{k+1}|^{2} - r_{k+1}^{T} r_k\right) / \left(p_k^{T} r_k\right)$, and
$y_{k+1}$ is the point generated in Theorem 1. Then $p_{k+1}$ is the steepest descent vector
to $E_{qw}$ restricted to the (N-k)-plane $\pi_{N-k}$ conjugate to $\pi_k$ given by $y_1$
and $p_1, \ldots, p_k$.

The vectors defined by Theorem 2 are referred to as conjugate directions, and Theorems 1 and
2 combined give the CGA. The error converges in just N steps if the error function is
strictly quadratic.
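The N-step convergence on a quadratic can be checked with a short sketch of the recursion in Theorems 1 and 2. It is a minimal pure-Python illustration: the quadratic Q(w) = c^T w + 0.5 w^T G w with the 3 x 3 matrix G and vector c below is a hypothetical example, and the Hessian is used explicitly for the step size, which is only practical for such a small test problem.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def mat_vec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(G, c, w):
    """Minimize Q(w) = c^T w + 0.5 w^T G w in at most N conjugate steps."""
    n = len(w)
    r = [-(gw + ci) for gw, ci in zip(mat_vec(G, w), c)]   # r_1 = -Q'(w_1)
    p = list(r)                                            # p_1 = r_1
    for _ in range(n):
        mu = dot(p, r)
        if mu == 0.0:                    # gradient already zero
            break
        delta = dot(p, mat_vec(G, p))    # p^T G p, exact second order information
        alpha = mu / delta               # step size along p
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r_new = [-(gw + ci) for gw, ci in zip(mat_vec(G, w), c)]
        beta = (dot(r_new, r_new) - dot(r_new, r)) / mu
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
    return w


G = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # symmetric, positive definite
c = [-1.0, -2.0, -3.0]
w_min = conjugate_gradient(G, c, [0.0, 0.0, 0.0])
residual = [gw + ci for gw, ci in zip(mat_vec(G, w_min), c)]
```

For this example the gradient G w + c vanishes, up to rounding, after N = 3 steps.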

Scaled conjugate gradient algorithm. The line search technique that the CGA uses for
determining the optimum step size is computationally costly. Moreover, the CGA may fail in
general and converge to a non-stationary point, because the algorithm works only with
positive definite Hessian matrices, and the quadratic approximations on which it is based
may not hold when the current point is far from the desired minimum. This problem is tackled
by combining the model trust region approach, known from the Levenberg-Marquardt algorithm,
with the CGA. The algorithm for updating the weights using the scaled conjugate gradient
algorithm (SCGA) is given below for convenience [50].


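Step 1. Choose the weight vector $w_1$ and scalars $0 < \sigma \le 10^{-4}$,
$0 < \lambda_1 \le 10^{-6}$, $\bar{\lambda}_1 = 0$. Set $p_1 = r_1 = -E'(w_1)$, $k = 1$ and
success = true.

Step 2. If success = true, then calculate the second order information:
    $\sigma_k = \sigma / |p_k|$,
    $s_k = \left(E'(w_k + \sigma_k p_k) - E'(w_k)\right) / \sigma_k$,
    $\delta_k = p_k^{T} s_k$.

Step 3. Scale $\delta_k$: $\delta_k = \delta_k + (\lambda_k - \bar{\lambda}_k)\,|p_k|^{2}$.

Step 4. If $\delta_k \le 0$, then make the Hessian matrix positive definite:
    $\bar{\lambda}_k = 2\left(\lambda_k - \delta_k / |p_k|^{2}\right)$,
    $\delta_k = -\delta_k + \lambda_k |p_k|^{2}$,
    $\lambda_k = \bar{\lambda}_k$.

Step 5. Calculate the step size: $\mu_k = p_k^{T} r_k$, $\alpha_k = \mu_k / \delta_k$.

Step 6. Calculate the comparison parameter:
    $\Delta_k = 2\,\delta_k \left[E(w_k) - E(w_k + \alpha_k p_k)\right] / \mu_k^{2}$.

Step 7. If $\Delta_k \ge 0$, then a successful reduction in error can be made:
    $w_{k+1} = w_k + \alpha_k p_k$,
    $r_{k+1} = -E'(w_{k+1})$,
    $\bar{\lambda}_k = 0$, success = true.
    If $k \bmod N = 0$, then restart the algorithm:
        $p_{k+1} = r_{k+1}$
    else
        $\beta_k = \left(|r_{k+1}|^{2} - r_{k+1}^{T} r_k\right) / \mu_k$,
        $p_{k+1} = r_{k+1} + \beta_k p_k$.
    If $\Delta_k \ge 0.75$, then reduce the scale parameter:
        $\lambda_k = 0.25\,\lambda_k$.
else
    $\bar{\lambda}_k = \lambda_k$,
    success = false.

Step 8. If $\Delta_k < 0.25$, then increase the scale parameter $\lambda_k$.

Step 9. If the steepest descent direction $r_k \ne 0$, then set $k = k + 1$ and go to
Step 2; otherwise terminate and return $w_{k+1}$ as the desired minimum.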
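The steps above can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in this work: the names are hypothetical, the increase of the scale parameter in Step 8 uses the rule λ_k + δ_k(1 - Δ_k)/|p_k|², which is assumed here from Møller's paper [5] because the step above does not spell it out, and the test function is a small hand-picked quadratic.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scg_minimize(error, gradient, w, sigma=1e-4, lam=1e-6, max_iter=200, tol=1e-12):
    """Minimize error(w) following Steps 1-9 of the scaled conjugate gradient algorithm."""
    n = len(w)
    lam_bar = 0.0
    r = [-g for g in gradient(w)]              # Step 1: p_1 = r_1 = -E'(w_1)
    p = list(r)
    success = True
    delta = 0.0
    for k in range(1, max_iter + 1):
        if dot(r, r) < tol:                    # Step 9: stop when the gradient vanishes
            break
        p_sq = dot(p, p)
        if success:                            # Step 2: second order information
            sigma_k = sigma / math.sqrt(p_sq)
            shifted = [wi + sigma_k * pi for wi, pi in zip(w, p)]
            # E'(w + sigma_k p) - E'(w), using r = -E'(w)
            s = [(gs + ri) / sigma_k for gs, ri in zip(gradient(shifted), r)]
            delta = dot(p, s)
        delta += (lam - lam_bar) * p_sq        # Step 3: scale delta
        if delta <= 0.0:                       # Step 4: force positive definiteness
            lam_bar = 2.0 * (lam - delta / p_sq)
            delta = -delta + lam * p_sq
            lam = lam_bar
        mu = dot(p, r)                         # Step 5: step size
        alpha = mu / delta
        w_new = [wi + alpha * pi for wi, pi in zip(w, p)]
        comparison = 2.0 * delta * (error(w) - error(w_new)) / mu ** 2   # Step 6
        if comparison >= 0.0:                  # Step 7: successful reduction in error
            r_new = [-g for g in gradient(w_new)]
            w = w_new
            lam_bar = 0.0
            success = True
            if k % n == 0:                     # restart with steepest descent
                p = list(r_new)
            else:
                beta = (dot(r_new, r_new) - dot(r_new, r)) / mu
                p = [rn + beta * pi for rn, pi in zip(r_new, p)]
            r = r_new
            if comparison >= 0.75:
                lam *= 0.25
        else:
            lam_bar = lam
            success = False
        if comparison < 0.25:                  # Step 8 (assumed rule from [5])
            lam += delta * (1.0 - comparison) / p_sq
    return w


# Hypothetical test problem: a 2-D quadratic with minimum at (1/11, 7/11).
def quad_error(w):
    return 0.5 * (4.0 * w[0] ** 2 + 2.0 * w[0] * w[1] + 3.0 * w[1] ** 2) - w[0] - 2.0 * w[1]


def quad_gradient(w):
    return [4.0 * w[0] + w[1] - 1.0, w[0] + 3.0 * w[1] - 2.0]


w_min = scg_minimize(quad_error, quad_gradient, [0.0, 0.0])
```

No line search is performed: the step size comes from the scaled curvature estimate δ_k, which needs only one extra gradient evaluation per successful iteration.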

3.3.1 Self-scaling scaled conjugate gradient algorithm

When the nonlinearity is removed from the output layer, the weights therein can be updated
using a linear scheme. In such linear schemes, singular value decomposition can be applied
to solve for the output layer weights, and the rest of the weights are then updated using
the usual BP schemes, either first order or second order. Here we choose to update all the
weights using BP with the SCGA and term the result the self-scaling scaled conjugate
gradient algorithm (SSCGA). Updating the weights in this manner avoids any premature
saturation of the ANN and, moreover, gives a better generalization, which the mixed scheme
cannot achieve without regularization. Furthermore, the outputs may be scaled in the higher
range for the purpose of training, which is not possible with a first order learning
algorithm.
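Why a linear output unit avoids premature saturation can be seen from the error signal that an output unit passes back. The sketch below is an illustration under assumptions: a squared-error criterion, a unipolar sigmoid as the nonlinear output, and hypothetical names and values.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def output_error_signal(net, target, linear_output):
    """Error signal (target - output) * f'(net) back-propagated from one output unit."""
    if linear_output:
        output, slope = net, 1.0              # linear unit: slope is always 1
    else:
        output = sigmoid(net)
        slope = output * (1.0 - output)       # sigmoid slope, close to 0 when saturated
    return (target - output) * slope


# A badly wrong output unit with a large net input (values chosen for illustration).
saturated = output_error_signal(net=10.0, target=0.0, linear_output=False)
linear = output_error_signal(net=10.0, target=0.0, linear_output=True)
```

With the sigmoid output the error signal is of the order of 1e-5 although the unit is almost maximally wrong, whereas the linear output passes back the full error.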
Chapter 4


Preliminary Remarks

Some problems are used by researchers as benchmark problems for testing the efficacy of ANN
models and of the learning algorithms developed. For example, the XOR and PARITY problems
are considered hard static mapping/classification problems. In this work we test the
compensatory ANN models in conjunction with the self-scaling scaled conjugate gradient
algorithm on benchmark problems, which we classify into two categories: 1) functional
mapping and 2) classification. In functional mapping we consider sin(x)sin(y) and another
functional mapping problem defined later in this chapter. In the classification category we
consider the XOR, parity, spiral and half adder truth problems. In the parity problem we
again consider four-bit, five-bit and six-bit parity problems.

FUNCTIONAL MAPPING

1 sin(x)sin(y) problem

sin(x)sin(y) is a general functional mapping problem which is used by researchers to test
network capabilities. The equation which describes the function is

$$z(x, y) = \sin(x)\,\sin(y) \qquad (4.1)$$

This function gets more complex as the norm of the input vector (x, y) grows. We have
generated a training set containing 2500 training patterns by varying the values of x and y
in the range (0, 5π). Fig. 4.1 shows the sin(x)sin(y) mapping.
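A training set of this size can be generated, for example, on a regular grid. The sketch below is an assumption-laden illustration: the 50 x 50 grid is only one way of obtaining 2500 patterns, the upper limit of the range is left as a parameter, and the names are hypothetical.

```python
import math


def make_sin_sin_patterns(points_per_axis=50, upper=5.0 * math.pi):
    """Return (x, y, z) training patterns with z = sin(x) * sin(y) on a regular grid."""
    step = upper / (points_per_axis - 1)
    patterns = []
    for i in range(points_per_axis):
        for j in range(points_per_axis):
            x, y = i * step, j * step
            patterns.append((x, y, math.sin(x) * math.sin(y)))
    return patterns


patterns = make_sin_sin_patterns()
```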
Fig. 4.1 sin(x) * sin(y) diagram


2 Functional

2 2 2 
z = exp(-(x + y)) + x y + 1.0/(y + x) + exp(-sin(x) + cos(y)) + (x + y)

CLASSIFICATION

XOR Problem

The exclusive-OR (XOR) problem is the classic problem requiring hidden units. Unlike the
other logic operations (AND, OR and their negations NAND, NOR), the XOR problem is not
linearly separable. The training set for the XOR problem is
{(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
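The lack of linear separability can be illustrated with a small search over single threshold units. This is a sketch, not a proof: the grid of candidate weights and thresholds is a hypothetical choice, and the function name is an assumption.

```python
from itertools import product

# Training sets as ((x1, x2), target) pairs.
xor_set = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
and_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def separable_by_single_unit(samples, grid):
    """Search the grid for weights and a threshold that classify every sample."""
    for w1, w2, theta in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 > theta) == bool(t) for (x1, x2), t in samples):
            return True
    return False


grid = [i * 0.5 for i in range(-6, 7)]   # candidate values from -3.0 to 3.0
and_ok = separable_by_single_unit(and_set, grid)
xor_ok = separable_by_single_unit(xor_set, grid)
```

A single threshold unit is found for AND but not for XOR, which is why XOR needs hidden units.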

The training set is precise, representing a set in which no measurement errors or noise
occur. On several occasions the system gets trapped in a local minimum, depending on the
initial parameters of the ANN. This is true for both first order and second order learning
algorithms. The new models are trained on the XOR problem and are also tested on a testing
set to validate their efficacy.





Parity Problem

The N-input parity problem has been a popular benchmark problem among ANN researchers. It
consists of mapping an N-bit binary number to its parity, i.e. if the input pattern contains
an odd number of ones the parity is 1, otherwise it is 0. This is considered a difficult
problem because the patterns that are closest (in Euclidean distance) in the sample space,
i.e. numbers that differ in only one bit, require different answers. The XOR is the 2-input
parity problem, and the general solution of the parity problem is a group of XOR circuits.
This benchmark is considered a perfect training set. The training data for the 4-bit parity
problem are shown in Table 4.1, those for the 5-bit parity problem in Table 4.2, and those
for the 6-bit parity problem in Table 4.3.
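The training data for any N follow directly from the definition of parity. A minimal sketch (the function name is hypothetical) is:

```python
from itertools import product


def parity_training_set(n_bits):
    """All n-bit input patterns paired with their parity (1 for an odd number of ones)."""
    return [(bits, sum(bits) % 2) for bits in product((0, 1), repeat=n_bits)]


four_bit = parity_training_set(4)    # 16 patterns
five_bit = parity_training_set(5)    # 32 patterns
six_bit = parity_training_set(6)     # 64 patterns
```

For N = 2 the same function reproduces the XOR training set.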


Table 4.1 4-bit parity problem

Table 4.2 5-bit parity problem

Table 4.3 6-bit parity problem
HALF ADDER TRUTH PROBLEM


This is a relatively simple problem compared with the parity problem. There are two inputs
and two outputs, and the problem is a combination of the two-input XOR and AND problems: the
network should classify both XOR and AND. Table 4.4 shows the training set.
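The truth table of the half adder can be generated as follows. The ordering of the two outputs as (sum, carry), i.e. (XOR, AND), is an assumption made here for illustration.

```python
# Each pattern is ((a, b), (sum, carry)) with sum = a XOR b and carry = a AND b.
half_adder_set = [((a, b), (a ^ b, a & b)) for a in (0, 1) for b in (0, 1)]
```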
SPIRAL PROBLEM

Fig. 4.3 shows the spiral classification problem.



Fig. 4.3 Spiral classification problem




Chapter 5

Results and Discussion


5.1 Preliminary Remarks

In Chapter 4 the various benchmark problems pertaining to functional mapping and
classification were discussed. Any new neuron model developed must be tested on these
benchmark problems to validate its efficacy. In this chapter we test the compensatory models
discussed in Chapter 2 on the benchmark problems described in Chapter 4.

5.2 Functional mapping

This may involve mapping from a lower-dimensional to a higher-dimensional system or vice
versa. Essentially, the capability of mapping a function depends upon the neuron model and
the architecture used. In the following section we first test on the sin(x)*sin(y) problem.
Throughout the text, the error on the original scale is presented against the number of
iterations.

5.2.1 sin(x)*sin(y)

Here the solution to this problem has been attempted using the compensatory models with
SSCGA learning. A comparative study of the STD with SCGA learning and compensatory model0
with SSCGA learning is also presented. Fig. 5.1a shows the convergence results for the STD
with the SCGA and for model0 with SSCGA learning. Here STD 6-6-1 refers to the standard
architecture with 6 neurons in the input layer, 6 in the hidden layer and 1 in the output
layer. For the compensatory model0, 6 neuron blocks have been taken. The number of
activation functions and the number of weights involved in STD 6-6-1 and in model0 are the
same. But it is obvious that the convergence in the case of STD is very slow with this
architecture. Increasing the number of neurons in the hidden layer of the STD, to 6-15-1,
has improved the result considerably, but the number of neurons and weights involved in this
case has doubled as compared with model0.

This is expected, as the compensatory model is based on higher order neuron models, as
discussed in the earlier chapters. An earlier and better convergence can be achieved using
the compensatory models. This may be advantageous for some critical industrial processes
where decisions have to be taken fast.

Fig. 5.1b presents the convergence behavior of the various compensatory models. It can be
observed that the performance of model6 is poor compared with the other models. Also, the
performance of model2 and model4 is not as good as that of model0, model1, model3 and
model5. This may be attributed to the fact that the power and multiplication terms provide a
bias towards higher order terms, while the first order terms get less weightage, and
therefore the models with multiplied power terms become relatively sluggish. The
performances of model0, model1, model3 and model5 are comparable. In the case of model0 and
model1 the initial convergence is faster. The plots of actual vs. predicted values for the
training set are not presented, as this periodic function produces the same output over
different periods and the values therefore overlap on the graph, making it blurred.
5.2.2 Functional

The equation of the functional is given in Chapter 4. Here the data for training are
generated by varying the values of x and y in steps of 0.25 and 0.5 respectively over a
hundred intervals. The training set consists of 83 data points, while the testing set
consists of 17 data points. The results of training and testing are presented in
Figs. 5.2-5.4. In Fig. 5.2 the error convergence during training is plotted against the
number of iterations for the various compensatory models. Again it is obvious that the
performance of model6 is poor compared with the other models. This reinforces the results
obtained for the sin(x)*sin(y) problem. Here model2 appears to work better than the other
models. This may be attributed to the initialization of the ANN parameters, as in later
sections it may be seen that model2, model4 and model6 perform a little sluggishly compared
with the other compensatory models. The performance of the compensatory models with power
terms needs further investigation with respect to the initialization of the network
parameters, which may be taken up in future work. The performances of the other models are
comparable. In Fig. 5.3 the predicted values are plotted against the actual values of the
output for the training set, for model0 and model1. The observed error in training is less
than 0.004% for individual outputs. This shows that the network is able to learn the
functional hidden in the data. In Fig. 5.4 the predicted values are plotted against the
actual values of the output for the testing set, for model0 and model1. Here also the
prediction error is less than 0.005% for each individual entry. Results for the other models
are not presented here, as they are of the same nature as those for model0 and model1. Even
in the case of model6, which does not converge properly, the error is relatively significant
at the lower values of the output.

5.3 Classification Problems

Here we test the compensatory neuron models on the XOR, parity and half adder truth problems
to validate their efficacy.

5.3.1 XOR Problem

The XOR problem is the simplest classification problem and requires a multilayer neural
network to solve it. In fact, using the standard backpropagation algorithm with the STD, it
may take around 1000 iterations to classify this set properly. Second order learning
algorithms make the convergence fast. It is found that the STD with a 2-1-1 architecture and
SCGA learning takes around 50 iterations to converge. Fig. 5.5 presents the error
convergence for the different compensatory models. It may be observed that the convergence
of the compensatory models involving power terms is poor compared with the others. The
convergence of model0 and model1 is faster: it is achieved in just 5-6 iterations, as is
evident from Fig. 5.5. In Fig. 5.6 and Fig. 5.7, increasing the number of neuron blocks from
1 to 2 and 3 respectively seems to have an adverse effect on the convergence of model0 and
model1, whereas the performance of model2, model4 and model6 improves drastically with the
number of neuron blocks. In Fig. 5.8 the convergence of model0 and model1 again seems to
improve. These fluctuations may be attributed to the initialization of the ANN parameters.
The performance of model5 in Figs. 5.6-5.8 is consistently good, and its convergence is fast
too. Tables 5.1 and 5.2 present the prediction results on the test data for model0 with
SSCGA learning and for the STD with SCGA learning respectively. Here a 2-2-1 structure is
taken for the STD, while for model0 only 1 neuron block is taken. It is evident that model0,
with approximately half the number of neurons of the STD, gives much better generalization.
It has been observed, however, that increasing the number of neurons in the hidden layer of
the STD improves the result on the testing set. Thus the performance of the compensatory
model appears to be superior to that of the STD.


Table 5.1 Results for the testing data set for model0

Input         Output    Predicted output    Error
0.1   0.9     1         0.91                0.09
0.8   0.2               0.78                0.22
      0.2               0.30                0.30
0.8   0.8

Table 5.2 Results for the testing data set for STD

Input         Output    Predicted output    Error
0.1   0.9     1         1.04                0.04
0.8   0.2               1.02                0.02
0.3   0.2     0         1.05                1.05
0.8   0.8     1                             0.01
5.3.2 Parity Problem

The N-bit parity problem has been used to compare different learning algorithms by
comparing the epochs necessary to produce a perfect result (no misclassification). This
problem is explored here to test the different compensatory neuron models in
conjunction with SSCGA learning, using the same number of epochs of learning.
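
The training set for the N-bit parity problem and the "perfect result" criterion can be sketched as follows. This is only an illustration: the odd-parity target convention, the 0.5 decision threshold, and the function names `parity_dataset` and `is_perfect` are assumptions and are not taken from this work.

```python
from itertools import product


def parity_dataset(n_bits):
    """Return (inputs, targets) covering all 2**n_bits binary patterns.

    Assumption: the target is 1 when the pattern holds an odd number of
    1-bits and 0 otherwise; the opposite convention is equally possible.
    """
    inputs = [list(bits) for bits in product((0, 1), repeat=n_bits)]
    targets = [sum(bits) % 2 for bits in inputs]
    return inputs, targets


def is_perfect(outputs, targets, threshold=0.5):
    """True when every thresholded output matches its target.

    Assumption: an output at or above `threshold` is read as class 1.
    """
    return all((out >= threshold) == bool(t) for out, t in zip(outputs, targets))


if __name__ == "__main__":
    patterns, labels = parity_dataset(4)
    print(len(patterns), "patterns for the 4-bit parity problem")
```

The number of patterns doubles with every added bit (16, 32 and 64 patterns for the 4-, 5- and 6-bit problems), which is one way to see why the harder cases need more neuron blocks.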





5.3.2.1 4-Bit Parity Problem

Figures 5.9-5.12 show the error convergence for the 4-bit parity problem. In Fig. 5.9 the
error convergence for the different models using 1 neuron block is presented. Here it is
observed that none of the models converges. Increasing the number of neuron blocks to 2
improves the convergence of all the models. The initial convergence of model0 is faster
as compared to other models. The convergence of model3 and model5 seems to be better
as compared to others. In fact, in the case of model0 and model1, the final slow
convergence reflects that these are trapped in local minima. In spite of this trapping in
a local minimum, model0 has classified in just 40 epochs.

5.3.2.2 5-Bit Parity Problem

The results for the 5-bit parity problem are presented in Figs. 5.13-5.16. From Fig. 5.13
it is obvious that none of the models converges with one neuron block. This is in
accordance with the complexity of the problem. As the complexity increases, the number
of neurons required also increases. Therefore convergence for the one-neuron block is
not expected. In Fig. 5.14 the convergence for the different models is shown. The plot
for model0 is not included, as its behavior is almost the same as that of model1. Also,
for model6, convergence in the case of the 5-bit parity problem is not observed with 1,
2, 3 and 4 neuron blocks; therefore it is excluded from the plots for the 5-bit parity
problem. However, it may be noted that increasing the number of neuron blocks improves
the classifying power of model6. Model3 and model5 converge faster as compared to other
models. Convergence in the case of model1 is poor as compared to other models. The
performance of model2 improves with the increasing number of neuron blocks.

5.3.2.3 6-Bit Parity Problem

In Figs. 5.17-5.20 the plots for the 6-bit parity problem are presented. It is obvious
from all these figures that model6 fails to converge. It is seen that as the complexity
of the classification problem increases, it becomes more difficult for model6 to
classify it. Also, the performance of model4 and model2 is poor. The performance of
model1 and model5 is the best. Here model1 shows some improvement with the increasing
number of neurons.

TABLE 5.4 SUMMARY SHEET FOR CLASSIFICATION PROBLEM

TABLE 5.5 SUMMARY SHEET FOR CLASSIFICATION PROBLEM

FOR THOSE CASES TRAINING WAS NOT DONE

Chapter 6
Closing Comments


6.1 Summary

In this work, seven compensatory models have been trained in conjunction with the
self-scaling scaled conjugate gradient algorithm (SSCGA). These models are tested on
benchmark problems. The results for model0 with SSCGA are also compared with those of
the STD with the scaled conjugate gradient algorithm (SCGA). It is found that the
performance of the compensatory model is better than that of the STD. Moreover, a large
computational saving is achieved using the compensatory model. Among the compensatory
models, model3 and model5 perform better than the other models. The models involving
the power terms of the output of the activation functions and their multiplication have
sluggish performance. The compensatory models do not work on spiral problems.

6.2 Scope for future work

There is considerable scope for future work. Some suggestions for achieving the
objective are summarized below.

* The power models can be further explored for initialization of the ANN
parameters.

* The spiral problem may be attempted by introducing one more layer of
compensatory neurons.

* For all the compensatory models, the net to the activation function can be altered
by summing up the multiplication and the summation term before feeding it to the
activation.

* One of the activation functions of the neuron block can be dropped, and the
compensatory summation of the summation and multiplication of weighted inputs
may be taken.
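
The third suggestion above, adding the multiplication term to the summation term before a single activation, can be sketched as follows. This is a minimal illustration and not one of the seven models studied here: the sigmoid activation, the use of one shared weight vector for both terms, the optional bias, and the name `combined_net_neuron` are all assumptions.

```python
import math


def sigmoid(z):
    """Logistic activation (assumed; the activation is not restated in this section)."""
    return 1.0 / (1.0 + math.exp(-z))


def combined_net_neuron(x, w, bias=0.0):
    """Hypothetical neuron: net = sum(w_i * x_i) + prod(w_i * x_i) + bias."""
    weighted = [wi * xi for wi, xi in zip(w, x)]
    summation_term = sum(weighted)
    multiplication_term = math.prod(weighted)
    # Both aggregation terms are summed first, then passed through one activation.
    return sigmoid(summation_term + multiplication_term + bias)


if __name__ == "__main__":
    print(combined_net_neuron([1.0, 1.0], [1.0, 1.0]))
```

With a single activation per neuron block there is one fewer nonlinearity to evaluate, which is presumably the motivation for the suggestion.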


From the results of the 4-bit, 5-bit and 6-bit parity problems, it may be concluded that
the performance of the summation models is better than that of the power models. Among
the summation models, model3 and model5 converge better than model1 for complex
problems with the same number of neuron blocks. As the complexity of the problem
increases, the performance of model3 and model5 improves.

5.3.2.4 Half adder truth problem

The XOR problem is a special case of the half adder truth problem, as discussed in
chapter 4. In Figs. 5.21-5.24 the error convergence for the half adder truth problem is
shown. Here the convergence of model0, model1, model3 and model5 is better as compared
to model2, model4 and model6 for one and two neuron blocks. But as the number of neuron
blocks is increased, the performance of model2, model4 and model6 improves, as
observed earlier for the XOR and parity problems.
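
The relation between the two problems can be seen from the truth table. The sketch below assumes the standard half adder, which maps two input bits to a sum bit and a carry bit; the exact encoding used in chapter 4 is not restated here, and the names `half_adder`, `truth_table` and `xor_targets` are illustrative.

```python
def half_adder(a, b):
    """Return (sum_bit, carry_bit) for two input bits (standard half adder)."""
    return a ^ b, a & b


# Full truth table: ((a, b), (sum, carry)) for all four input patterns.
truth_table = [((a, b), half_adder(a, b)) for a in (0, 1) for b in (0, 1)]

# Keeping only the sum output gives the XOR targets.
xor_targets = [outputs[0] for _, outputs in truth_table]

if __name__ == "__main__":
    for inputs, outputs in truth_table:
        print(inputs, "->", outputs)
```

A network trained on the half adder therefore has two outputs, and dropping the carry output reduces the task to XOR.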

5.4 Concluding Remarks

In this chapter the compensatory models were tested on various benchmark problems, and
it is found that the performance of the compensatory models is better than that of the
STD. The generalization characteristics of the compensatory models are much better than
those of the STD. The compensatory models are computationally more efficient and
converge faster too. Among the compensatory models, model3 and model5 seem to work
better for a range of complex problems. Model0 and model1 are comparable and show
better performance as compared to model2, model4 and model6.

For the spiral problem, it has been observed that the STD converges but the
compensatory models do not converge. This is expected for problems where there is
overlapping of different classes and thereby proximity of different classes in the
phase space. The compensatory models, being higher order models, create a higher order
nonlinear boundary as compared to the STD, which creates a simple boundary. Therefore
separability in such cases using the compensatory models becomes difficult.
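
To illustrate the kind of data involved, the sketch below generates the widely used two-spirals benchmark, in which two interleaved classes wind around each other. The spiral data actually used in this work is not specified in this section, so the parameterization (97 points per spiral, maximum radius 6.5, angle step pi/16) and the name `two_spirals` are assumptions.

```python
import math


def two_spirals(points_per_spiral=97, max_radius=6.5):
    """Return a list of (x, y, label) points forming two interleaved spirals.

    Assumption: this follows a common two-spirals parameterization; the
    second spiral is the first one reflected through the origin.
    """
    data = []
    for i in range(points_per_spiral):
        angle = i * math.pi / 16.0
        radius = max_radius * (104 - i) / 104.0
        x = radius * math.sin(angle)
        y = radius * math.cos(angle)
        data.append((x, y, 1))    # point on the first spiral
        data.append((-x, -y, 0))  # mirrored point on the second spiral
    return data


if __name__ == "__main__":
    points = two_spirals()
    print(len(points), "points; first:", points[0])
```

Because each spiral passes between successive turns of the other, points of opposite classes lie close together over the whole input plane, which matches the proximity of classes described above.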






FIG 5.1a A comparison of error convergence for the STD architectures and the
compensatory model0 for sin(x)*sin(y) problem

FIG 5.1b Convergence error for sin(x)*sin(y) problem on training set for
compensatory models using SSCGA learning

FIG 5.2 Error convergence for the functional during training

FIG 5.3a Comparison of Actual Vs Predicted values obtained using model0 with SSCGA learning

FIG 5.6 Error plot of XOR problem for CNNA with 2 Neurons

FIG 5.7 Error plot of XOR problem for CNNA with 3 Neurons

FIG 5.8 Error plot of XOR problem for CNNA with 4 Neurons

FIG 5.9 Error plot of 4-bit Parity problem for CNNA with one Neuron

FIG 5.10 Error plot of 4-bit Parity problem for CNNA with 2 Neurons

FIG 5.11 Error plot of 4-bit Parity problem for CNNA with 3 Neurons

FIG 5.12 Error plot of 4-bit Parity problem for CNNA with 4 neurons

FIG 5.15 Error plot of 5-bit Parity problem for CNNA with 3

FIG 5.17 Error plot of 6-bit Parity problem for CNNA with 1 neuron

FIG 5.19 Error plot of 6-bit Parity problem for CNNA with 3 neurons

FIG 5.23 Error plot of Half adder truth problem for CNNA with 3 neurons

FIG 5.24 Error plot of Half adder truth problem for CNNA with 4 neurons